SECOM: A Novel Hash Seed and Community Detection Based-Approach for Genome-Scale Protein Domain Identification
Authors
Abstract
With rapid advances in DNA sequencing technologies, a plethora of high-throughput genome and proteome data has been generated from a diverse spectrum of organisms. The functional annotation and evolutionary history of proteins are usually inferred from domains predicted from the genome sequences. However, traditional database-based domain prediction methods cannot identify novel domains, and alignment-based methods, which look for recurring segments in the proteome, are computationally demanding. Here, we propose a novel genome-wide domain prediction method, SECOM. Instead of conducting all-against-all sequence alignment, SECOM first indexes all the proteins in the genome by using a hash seed function. Local similarity can thus be detected and encoded into a graph structure, in which each node represents a protein sequence and each edge weight represents the number of hash seeds shared by the two nodes. SECOM then formulates the domain prediction problem as an overlapping community-finding problem in this graph, and a backward graph percolation algorithm is proposed to identify the domains efficiently. We tested SECOM on five recently sequenced genomes of aquatic animals. Our tests demonstrated that SECOM was able to identify most of the known domains identified by InterProScan. Compared with the alignment-based method, SECOM showed higher sensitivity in detecting putative novel domains and was three orders of magnitude faster. For example, SECOM was able to predict a novel sponge-specific domain in nucleoside-triphosphatases (NTPases). Furthermore, SECOM discovered two novel domains, likely of bacterial origin, that are taxonomically restricted to sea anemone and hydra. SECOM is an open-source program and is available at http://sfb.kaust.edu.sa/Pages/Software.aspx.
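The indexing and graph-construction steps described in the abstract can be illustrated with a minimal sketch. It is not SECOM's implementation: the abstract does not specify the hash seed function, so plain contiguous k-mers are used here as a stand-in, and the names `seed_index`, `similarity_graph`, and the parameter `k` are hypothetical. The community-finding and backward graph percolation stages are not covered.

```python
from collections import defaultdict
from itertools import combinations


def seed_index(proteins, k=3):
    """Map each k-mer to the set of proteins containing it.

    Assumption: contiguous k-mers stand in for the hash seeds;
    the actual seed function used by SECOM is not given here.
    """
    index = defaultdict(set)
    for name, seq in proteins.items():
        for i in range(len(seq) - k + 1):
            index[seq[i:i + k]].add(name)
    return index


def similarity_graph(proteins, k=3):
    """Return {(a, b): weight}; weight counts distinct seeds shared by a and b."""
    weights = defaultdict(int)
    for names in seed_index(proteins, k).values():
        # Every pair of proteins that shares this seed gains one unit of weight.
        for a, b in combinations(sorted(names), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Toy sequences (made up for illustration): P1 and P2 share the segment "TAYIAK".
proteins = {
    "P1": "MKTAYIAKQR",
    "P2": "GGTAYIAKLL",
    "P3": "WWWWWWWWWW",
}
graph = similarity_graph(proteins, k=4)
print(graph)
```

Because proteins are compared only through the shared index, no pairwise alignment is performed; pairs with no seed in common, such as P3 with either of the others, never appear in the graph.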
Similar resources
A Novel Intelligent Fault Diagnosis Approach for Critical Rotating Machinery in the Time-frequency Domain
Rotating machinery is a common class of machinery in industry. The root cause of faults in rotating machinery is often faulty rolling element bearings. This paper presents a novel technique using artificial neural network learning for automated diagnosis of localized faults in rolling element bearings. The inputs of this technique are a number of features (harmmean and median), whic...
A new technique for bearing fault detection in the time-frequency domain
This paper presents a new Fast Kurtogram Method in the time-frequency domain using novel types of statistical features instead of the kurtosis. For this study, the problem of four classes for Bearing Fault Detection is investigated using various statistical features. This research is conducted in four stages. At first, the stability of each feature for each fault mode is investigated. Then, res...
Fault Type Estimation in Power Systems
This paper presents a novel approach for fault type estimation in power systems. Fault type estimation is the first step in estimating instantaneous voltage, voltage sag magnitude and duration in a three-phase system during a fault. The approach is based on time-domain state estimation where redundant measurements are available. The current based model allows a linear mapping between the m...
A Novel Technique for Steganography Method Based on Improved Genetic Algorithm Optimization in Spatial Domain
This paper studies secret message delivery using a cover image and introduces a novel steganographic technique based on a genetic algorithm to find a near-optimum structure for the pair-wise least-significant-bit (LSB) matching scheme. A survey of the related literature shows that the LSB matching method developed by Mielikainen employs a binary function to reduce the numbe...
Design of nonlinear parity approach to fault detection and identification based on Takagi-Sugeno fuzzy model and unknown input observer in nonlinear systems
In this study, a novel fault detection scheme is developed for a class of nonlinear systems in the presence of sensor noise. A nonlinear Takagi-Sugeno fuzzy model is implemented to create multiple models. While the T-S fuzzy model is used for only the nonlinear distribution matrix of the fault and measurement signals, a larger category of nonlinear systems is considered. Next, a mapping to decou...
Journal title:
Volume 7, Issue
Pages -
Publication date: 2012